Automating the Analysis of HIV Immunogen Antigenic Characteristics

(Alt Title: I'm getting lazy)

Michael Chambers

What I Do: A lotta quality control for HIV Immunogens

Objective: Automate the data analysis for these immunogens

My Goals:

Convert MSD output file to .csv
Break up the .csv into 8x12 arrays
Average column duplicates and create graph
Import raw data into Prism for further analysis
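
The first two goals can be sketched with the standard library alone. This is a minimal sketch, not the code in msd_script.py: it assumes the MSD export looks like the data.txt used in the demo (a `==========Data===` banner, a tab-separated header of column numbers, then one line per plate row A to H). The names `parse_msd` and `plates_to_csv` are hypothetical, and the sample holds only two rows of plate 1 to keep it short.

```python
import csv
import io

ROW_LABELS = list("ABCDEFGH")  # a 96-well plate has 8 rows (A-H) and 12 columns


def parse_msd(text):
    """Split raw MSD text into plates; each plate maps row label -> 12 readings."""
    plates = []
    for line in text.splitlines():
        if line.startswith("=") and "Data" in line:
            plates.append({})  # banner line: a new plate begins
            continue
        fields = line.split("\t")
        # Keep only well rows; the column-number header and blank lines are skipped.
        if plates and len(fields) == 13 and fields[0] in ROW_LABELS:
            plates[-1][fields[0]] = [int(value) for value in fields[1:]]
    return plates


def plates_to_csv(plates):
    """Write every plate to one CSV string, tagged with plate number and row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["plate", "row"] + list(range(1, 13)))
    for number, plate in enumerate(plates, start=1):
        for label, readings in plate.items():
            writer.writerow([number, label] + readings)
    return buffer.getvalue()


# Two rows of plate 1 from data.txt (assumed format, shortened banner).
sample = (
    "==========Data==========\n"
    "\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\n"
    "A\t963268\t965546\t934816\t965927\t9476\t9728"
    "\t75679\t75719\t58511\t57280\t189749\t186988\n"
    "H\t86\t89\t165\t207\t1621\t1640\t97\t100\t187\t568\t126\t126\n"
)

plates = parse_msd(sample)
csv_text = plates_to_csv(plates)
print(csv_text)
```

A CSV in this shape should open directly in a spreadsheet, which is one way to get the raw numbers into Prism by copy and paste.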

What I made:

A script that accomplishes ALMOST all of the above
A rudimentary module to easily manipulate the data from each plate

DEMO TIME!



In [1]:

    
# This is the MSD .txt output file I'll be working with:
with open('data.txt') as f:
    data = f.read()
data

    Out[1]:

'==========Data==================================================================================================================================================================================================\n\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\nA\t963268\t965546\t934816\t965927\t9476\t9728\t75679\t75719\t58511\t57280\t189749\t186988\nB\t973919\t976761\t940526\t939458\t5534\t5500\t37403\t37522\t29628\t28250\t91300\t94668\nC\t907480\t912649\t875027\t873084\t3671\t3950\t20350\t20372\t16409\t16225\t51356\t51729\nD\t756367\t762278\t723299\t722686\t3134\t2984\t10568\t10621\t8535\t8338\t26760\t27257\nE\t465322\t468026\t429960\t437096\t2602\t331\t5604\t5873\t4866\t4973\t14761\t14776\nF\t244089\t250824\t234123\t237409\t2071\t2177\t2944\t3083\t2874\t2834\t7214\t7122\nG\t134690\t139305\t128528\t133833\t2030\t2035\t1663\t1759\t1688\t1724\t3949\t3798\nH\t86\t89\t165\t207\t1621\t1640\t97\t100\t187\t568\t126\t126\n\n==========Data==================================================================================================================================================================================================\n\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\nA\t146106\t148267\t1008\t1007\t1112\t1183\t2666\t2648\t3053\t3033\t4916\t4792\nB\t109212\t107029\t557\t623\t1056\t1051\t1322\t1344\t1690\t1744\t2362\t2405\nC\t71341\t67961\t404\t402\t1129\t1331\t762\t783\t1130\t1140\t1313\t1286\nD\t38312\t35936\t232\t319\t1323\t1192\t418\t453\t826\t825\t718\t750\nE\t16283\t16196\t277\t285\t1205\t1254\t267\t287\t667\t670\t439\t446\nF\t6311\t6260\t249\t258\t1336\t1260\t185\t188\t595\t597\t268\t274\nG\t3023\t2910\t256\t260\t1399\t1335\t149\t155\t532\t560\t205\t200\nH\t155\t160\t228\t254\t1319\t1259\t95\t99\t500\t522\t127\t130\n'

(1) Demo Script: msd_script.py

(2) Demo Module: msd_module.py



In [2]:

    
# Import msd_module
%matplotlib inline
import msd_module

msd_module imported!



In [3]:

    
#Check Docstring
msd_module?



In [4]:

    
project1 = msd_module.msd_96()

Input date (e.g. yymmdd): 160511
Input project name (e.g. my_project): Project2



In [6]:

    
project1.create_df('data.txt')

df created!



In [7]:

    
project1.df



In [8]:

    
project1.split_plates()

Plate#_1
  row  dilution         1         2       3        4        5         6
0   A  5.000000  964407.0  950371.5  9602.0  75699.0  57895.5  188368.5
1   B  2.500000  975340.0  939992.0  5517.0  37462.5  28939.0   92984.0
2   C  1.250000  910064.5  874055.5  3810.5  20361.0  16317.0   51542.5
3   D  0.625000  759322.5  722992.5  3059.0  10594.5   8436.5   27008.5
4   E  0.312500  466674.0  433528.0  1466.5   5738.5   4919.5   14768.5
5   F  0.156250  247456.5  235766.0  2124.0   3013.5   2854.0    7168.0
6   G  0.078125  136997.5  131180.5  2032.5   1711.0   1706.0    3873.5
7   H  0.039062      87.5     186.0  1630.5     98.5    377.5     126.0
Axes(0.125,0.125;0.775x0.775)
Plate#_2
   row  dilution         1       2       3       4       5       6
8    A  5.000000  147186.5  1007.5  1147.5  2657.0  3043.0  4854.0
9    B  2.500000  108120.5   590.0  1053.5  1333.0  1717.0  2383.5
10   C  1.250000   69651.0   403.0  1230.0   772.5  1135.0  1299.5
11   D  0.625000   37124.0   275.5  1257.5   435.5   825.5   734.0
12   E  0.312500   16239.5   281.0  1229.5   277.0   668.5   442.5
13   F  0.156250    6285.5   253.5  1298.0   186.5   596.0   271.0
14   G  0.078125    2966.5   258.0  1367.0   152.0   546.0   202.5
15   H  0.039062     157.5   241.0  1289.0    97.0   511.0   128.5
Axes(0.125,0.125;0.775x0.775)
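
The Plate#_1 table above is consistent with averaging each adjacent pair of columns (1 and 2, 3 and 4, and so on), so 12 wells per row become 6 sample means. A minimal sketch of that step, assuming duplicates always sit side by side; `average_duplicates` is a hypothetical helper, not the module's own method:

```python
def average_duplicates(readings):
    """Average adjacent pairs: 12 well readings become 6 sample means."""
    if len(readings) % 2:
        raise ValueError("expected an even number of readings")
    return [(readings[i] + readings[i + 1]) / 2
            for i in range(0, len(readings), 2)]


# Row A of plate 1, copied from data.txt.
row_a = [963268, 965546, 934816, 965927, 9476, 9728,
         75679, 75719, 58511, 57280, 189749, 186988]

row_a_means = average_duplicates(row_a)
print(row_a_means)
# [964407.0, 950371.5, 9602.0, 75699.0, 57895.5, 188368.5]
```

The printed means match row A of Plate#_1.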



In [9]:

    
project1.dilution

    Out[9]:

[5, 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125, 0.0390625]



In [10]:

    
project1.create_dilution(5,3,8)



In [11]:

    
project1.dilution

    Out[11]:

[5,
 1.6666666666666667,
 0.5555555555555556,
 0.1851851851851852,
 0.0617283950617284,
 0.0205761316872428,
 0.006858710562414266,
 0.0022862368541380885]
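
The two outputs above suggest that `create_dilution(5,3,8)` means: start at 5, divide by 3 at each step, 8 points in total, and that the default series divides by 2. A minimal sketch under that assumption; the argument order is inferred from the output rather than from the module's source, and `dilution_series` is a hypothetical name:

```python
def dilution_series(start, factor, points):
    """Return `points` concentrations, each one `factor`-fold below the last."""
    series = []
    value = start
    for _ in range(points):
        series.append(value)
        value = value / factor  # repeated division, one step per plate row
    return series


default_series = dilution_series(5, 2, 8)
print(default_series)
# [5, 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125, 0.0390625]

threefold_series = dilution_series(5, 3, 8)
```

Each value lines up with one plate row, A through H, which is how the dilution column in the plate tables gets filled.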



In [12]:

    
project1.split_plates()

Plate#_1
  row  dilution         1         2       3        4        5         6
0   A  5.000000  964407.0  950371.5  9602.0  75699.0  57895.5  188368.5
1   B  1.666667  975340.0  939992.0  5517.0  37462.5  28939.0   92984.0
2   C  0.555556  910064.5  874055.5  3810.5  20361.0  16317.0   51542.5
3   D  0.185185  759322.5  722992.5  3059.0  10594.5   8436.5   27008.5
4   E  0.061728  466674.0  433528.0  1466.5   5738.5   4919.5   14768.5
5   F  0.020576  247456.5  235766.0  2124.0   3013.5   2854.0    7168.0
6   G  0.006859  136997.5  131180.5  2032.5   1711.0   1706.0    3873.5
7   H  0.002286      87.5     186.0  1630.5     98.5    377.5     126.0
Axes(0.125,0.125;0.775x0.775)
Plate#_2
   row  dilution         1       2       3       4       5       6
8    A  5.000000  147186.5  1007.5  1147.5  2657.0  3043.0  4854.0
9    B  1.666667  108120.5   590.0  1053.5  1333.0  1717.0  2383.5
10   C  0.555556   69651.0   403.0  1230.0   772.5  1135.0  1299.5
11   D  0.185185   37124.0   275.5  1257.5   435.5   825.5   734.0
12   E  0.061728   16239.5   281.0  1229.5   277.0   668.5   442.5
13   F  0.020576    6285.5   253.5  1298.0   186.5   596.0   271.0
14   G  0.006859    2966.5   258.0  1367.0   152.0   546.0   202.5
15   H  0.002286     157.5   241.0  1289.0    97.0   511.0   128.5
Axes(0.125,0.125;0.775x0.775)

That's it!

So what's coming in msd_module VERSION 2.0!!!

(more like version 0.0.2)

Is it on GitHub? HELL YEAH!!!

https://github.com/greenkidneybean/MSD_Module

How much time do these tools save me?

For right now I'll just say I'm in the red.

Cheers!

~mc

Raw data from data.txt as one table (both plates, rows 0 to 15):

	Rows	1	2	3	4	5	6	7	8	9	10	11	12
0	A	963268	965546	934816	965927	9476	9728	75679	75719	58511	57280	189749	186988
1	B	973919	976761	940526	939458	5534	5500	37403	37522	29628	28250	91300	94668
2	C	907480	912649	875027	873084	3671	3950	20350	20372	16409	16225	51356	51729
3	D	756367	762278	723299	722686	3134	2984	10568	10621	8535	8338	26760	27257
4	E	465322	468026	429960	437096	2602	331	5604	5873	4866	4973	14761	14776
5	F	244089	250824	234123	237409	2071	2177	2944	3083	2874	2834	7214	7122
6	G	134690	139305	128528	133833	2030	2035	1663	1759	1688	1724	3949	3798
7	H	86	89	165	207	1621	1640	97	100	187	568	126	126
8	A	146106	148267	1008	1007	1112	1183	2666	2648	3053	3033	4916	4792
9	B	109212	107029	557	623	1056	1051	1322	1344	1690	1744	2362	2405
10	C	71341	67961	404	402	1129	1331	762	783	1130	1140	1313	1286
11	D	38312	35936	232	319	1323	1192	418	453	826	825	718	750
12	E	16283	16196	277	285	1205	1254	267	287	667	670	439	446
13	F	6311	6260	249	258	1336	1260	185	188	595	597	268	274
14	G	3023	2910	256	260	1399	1335	149	155	532	560	205	200
15	H	155	160	228	254	1319	1259	95	99	500	522	127	130